The dd-gzip-split file size trick
=Introduction=
This trick goes by a few names. It's most commonly used to get around file size limitations when backing up a hard disk image or rescuing your data from media that is about to die. It uses three separate and common Unix tools, joined with pipes, to take one huge monolithic set of data, such as a hard disk or hard disk partition, and turn it into a series of smaller compressed files.

The first tool, dd, is an old and trusted tool. It dumps your source data (such as the media you are rescuing data from) to stdout. The second tool, gzip, compresses the data from dd on the fly and dumps THAT out to its own stdout. Finally, the split command splits the data from gzip into a series of files of a given size. This is all done in one line like so:

==Use of dd==
Some will tell you that these days dd is only used for really arcane Unix dealings with mid-to-large amounts of data. While this is partly true, dd is as useful now as it ever was. If you understand Unix and piping well enough, this part is relatively easy.

As always with dd, PAY ATTENTION TO WHAT YOU ARE DOING! This cannot be stressed enough, because dd '''does not ask you anything; it is not interactive.''' It's very easy to make a simple mistake, like switching your ins and your outs, and totally hose your data.

In the example above, only two of the usual arguments are used. The argument starting with "if=" stands for "input file"; this is where you are getting your data from. When working with disks and partitions, be careful to select the right one. (I once made this mistake myself: instead of reading from just the one partition I wanted, /dev/hda1 if memory serves, I omitted the "1" and accidentally told dd to read from the entire disk, which had three or four partitions at the time.) The argument starting with "bs=" is the block size, i.e. how much data dd reads and writes at a time. This can be a number in bytes, kilobytes (k), megabytes (m, like in the example), or gigabytes (g).
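To see dd on its own without risking a real disk, here is a minimal sketch that copies an ordinary file through stdout. It assumes a 3 MiB file of random data (the hypothetical fake-disk.raw in a temporary directory) stands in for the disk or partition, and it spells the block size out in bytes so it works with both GNU and BSD dd.

```shell
#!/bin/sh
# Minimal sketch of dd with if= and bs= but no of=.
# Assumption: an ordinary file stands in for the disk or partition, so nothing
# real is read or overwritten. The file names used here are hypothetical.
set -e
work=$(mktemp -d)
src="$work/fake-disk.raw"

# Create the stand-in "media": 3 MiB (3145728 bytes) of random data.
head -c 3145728 /dev/urandom > "$src"

# No of= is given, so dd writes everything to stdout; the shell redirects it
# into a file. bs=1048576 is 1 MiB spelled out in bytes, which every dd accepts.
dd if="$src" bs=1048576 > "$work/copy.raw"

# The copy is byte-for-byte identical to the source.
cmp "$src" "$work/copy.raw" && echo "copy is identical to the source"
```

dd prints its transfer statistics on stderr rather than stdout, which is why they never end up mixed into the data when its output is piped into gzip. The temporary directory is left in place for inspection; remove it with rm -rf when done.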
Leaving out the output file ("of="), as in most cases like this one, means that dd dumps everything to stdout.

==Use of gzip==
This is a less common way of using gzip. When the media in question is larger than a few gigabytes, it is prudent to compress the data like this. Since no file to compress is given, gzip reads from stdin, which is piped in from dd. The "-c" argument tells gzip to write the compressed data right back out to stdout. With this trick you will almost NEVER see any output from gzip, so, as always, PAY ATTENTION!

==Use of split==
Ah, the old and trustworthy split command. This is also an unusual use of it. The "-b 1025m" argument tells split to cut the incoming data (wherever it's coming from) into pieces of exactly that size, in this case 1025 MB; only the last piece is smaller, holding whatever is left over. The "-" by itself tells split that the data is coming from stdin and not from a file. Finally, the last argument is the location and beginning of the file name, i.e. where the resulting pieces go. Note the dot at the end of that location: split appends its suffixes (aa, ab, ac, ...) directly to this prefix, so the dot ends up in the resulting file names. Without it, instead of files named like "/root/whatev.img.gz.aj", you would get "/root/whatev.img.gzaj". Depending on what you are doing, this may be important.

=How to undo it=
At some point, you will want to reassemble all the pieces. Again, we use three commands and two pipes:

==Use of cat==
This is essentially the whole thing in reverse. Instead of split, cat is used together with an asterisk in the file name. The shell expands the asterisk into an alphabetically sorted list of matching files, and since split's suffixes sort in the order the pieces were created, cat reads every piece back in the exact original order without any other concerns. It all gets piped into the next part.

==The rest of it==
This time gzip reverses the flow, from compression to decompression (its "-d" argument, or the gunzip command). More information is available in the gzip man page. The decompressed data is piped out to dd, which in this case reads from stdin because no input file is specified.
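The backup and restore pipelines described above can be tried end to end on a throwaway file. This is a minimal sketch under hypothetical assumptions: a 5 MiB random file stands in for the disk or partition, pieces are 1 MiB instead of 1025 MB, and everything lives in a temporary directory instead of /root.

```shell
#!/bin/sh
# Minimal sketch of the whole trick, backup then restore, on a stand-in file.
# Assumptions (hypothetical, for illustration only): a 5 MiB file plays the
# disk or partition, pieces are 1 MiB instead of 1025 MB, and everything lives
# in a temporary directory instead of /root.
set -e
work=$(mktemp -d)
src="$work/fake-disk.raw"
head -c 5242880 /dev/urandom > "$src"

# Backup: dd reads the source, gzip -c compresses stdin to stdout, and
# split -b 1m - cuts stdin into 1 MiB pieces. The trailing dot in the prefix
# gives names like whatev.img.gz.aa, whatev.img.gz.ab, ...
dd if="$src" bs=1048576 | gzip -c | split -b 1m - "$work/whatev.img.gz."
ls "$work"/whatev.img.gz.*

# Restore: the glob expands in sorted order (aa, ab, ...), which is the order
# split wrote the pieces in; gzip -dc decompresses stdin to stdout; dd writes
# to the file named by of=. On a real system of= would be the target media.
cat "$work"/whatev.img.gz.* | gzip -dc | dd of="$work/restored.raw" bs=1048576

# Verify the round trip.
cmp "$src" "$work/restored.raw" && echo "restored image matches the original"
```

Random data barely compresses, so here gzip saves nothing and the output still spans several pieces; a real disk with lots of zeroed or repetitive data usually shrinks much more. Remove the temporary directory with rm -rf when you are finished.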
The output file ("of=") should be the media you are restoring your data to, so double-check it before running the command. Be aware that both processes can take a very long time.

=See Also=
*michi blog: HDD or partition backup with dd
*Unix DD Command and Image Creation
*dd FreeBSD man page
*UNIX man pages: gunzip